Half-way seminar: Bayesian Inference for Models of Text Data

Isac Boström (Chalmers)

Wed Mar 18, 12:15-13:00
Lecture held in MVL14.

Abstract: Models of text data are increasingly applied to inference tasks in the social sciences to investigate a wide range of linguistic and cultural phenomena. Word embeddings, for example, are commonly used to study semantic change, political language, and social bias in large collections of text. However, these models are typically estimated by optimization, producing point estimates without principled uncertainty quantification.

In this talk, I present a Bayesian formulation of probabilistic word embedding models, focusing on skip-gram with negative sampling and briefly discussing continuous bag-of-words. I explain why the posterior distribution is non-identifiable under general linear transformations of the embedding space and introduce a simple and principled constraint that ensures a well-defined posterior. I then compare different approaches to posterior inference, including mean-field variational inference, Hamiltonian Monte Carlo, and Pólya-Gamma Gibbs sampling. By augmenting the likelihood with Pólya-Gamma latent variables, we obtain an efficient sampler that provides scalable and well-calibrated uncertainty quantification.
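The two technical points in the abstract can be illustrated with a small NumPy sketch. This is not the speaker's implementation, and every function and variable name in it is hypothetical. First, skip-gram scores depend on word vectors U and context vectors V only through the dot products U V^T. Replacing U with U A and V with V A^{-T}, for any invertible A, therefore leaves every score, and hence the likelihood, unchanged. Second, once the context vectors are fixed, the skip-gram-with-negative-sampling likelihood for a single word vector has the form of a logistic regression: observed pairs are labelled 1 and negative samples are labelled 0. This is where Pólya-Gamma augmentation applies. With ω_i ~ PG(1, x_i^T β), the conditional of β given ω is Gaussian. The sketch runs that Gibbs sampler on a plain logistic regression with an assumed N(0, σ²I) prior. It draws PG(1, c) variables by truncating their infinite-sum representation, so those draws are approximate; exact samplers exist.

```python
import numpy as np


def sample_pg1(c, rng, n_terms=200):
    """Approximate PG(1, c) draws (hypothetical helper) by truncating the series
    omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(1, 1)."""
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)                         # series index k = 1..n_terms
    g = rng.gamma(shape=1.0, scale=1.0, size=c.shape + (n_terms,))
    denom = (k - 0.5) ** 2 + c[..., None] ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)


def pg_gibbs_logistic(X, y, prior_var=1.0, n_iter=500, seed=0):
    """Gibbs sampler for logistic regression (y in {0, 1}) with an assumed N(0, prior_var * I) prior."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    kappa = y - 0.5                                       # kappa_i = y_i - 1/2
    prior_prec = np.eye(d) / prior_var
    beta = np.zeros(d)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        omega = sample_pg1(X @ beta, rng)                 # omega_i | beta ~ PG(1, x_i^T beta)
        cov = np.linalg.inv(X.T @ (omega[:, None] * X) + prior_prec)
        cov = (cov + cov.T) / 2                           # symmetrize against round-off
        mean = cov @ (X.T @ kappa)                        # beta | omega, y ~ N(mean, cov)
        beta = rng.multivariate_normal(mean, cov)
        draws[t] = beta
    return draws


# 1) Non-identifiability: U V^T is unchanged by U -> U A, V -> V A^{-T}.
rng = np.random.default_rng(1)
U = rng.normal(size=(5, 3))                    # 5 word vectors in dimension 3
Vc = rng.normal(size=(7, 3))                   # 7 context vectors
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)    # almost surely invertible
U2, Vc2 = U @ A, Vc @ np.linalg.inv(A).T
scores_equal = np.allclose(U @ Vc.T, U2 @ Vc2.T)

# 2) PG Gibbs sampling on synthetic data with hypothetical true coefficients.
beta_true = np.array([2.0, -1.0])
X = rng.normal(size=(300, 2))
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)
draws = pg_gibbs_logistic(X, y)
posterior_mean = draws[100:].mean(axis=0)      # discard the first 100 draws as burn-in
print(scores_equal, posterior_mean)
```

Both conditionals in the sampler are available in closed form, so it needs no step-size tuning. In a full embedding model, one would alternate such updates over word and context vectors. The identifiability constraint mentioned in the abstract is not reproduced here.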

I will also briefly discuss the structural topic model as a related example where Bayesian uncertainty plays a central role.

machine learning, probability, statistics theory

Audience: researchers in the discipline


Gothenburg statistics seminar

Series comments: The Gothenburg statistics seminar is open to the interested public; everybody is welcome. It usually takes place in MVL14 (http://maps.chalmers.se/#05137ad7-4d34-45e2-9d14-7f970517e2b60); see each talk's listing for its venue. Speakers are asked to prepare material for 35 minutes, excluding questions from the audience.

Organizers: Akash Sharma*, Helga Kristín Ólafsdóttir*, Kasper Bågmark*
*contact for this listing
